Overview
Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 101 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 63.6 KiB |
| Average record size in memory | 644.4 B |
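The average record size can be cross-checked from the total size and the observation count. This is a minimal sketch, assuming 1 KiB = 1024 B; the small gap to the reported 644.4 B is consistent with the total having been rounded to one decimal place.

```python
# Figures copied from the Dataset statistics table above.
total_kib = 63.6          # total size in memory, rounded to 0.1 KiB
n_observations = 101
n_variables = 22

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_observations

# Number of cells that the "Missing cells" count is measured against.
cells = n_observations * n_variables

print(f"average record size ~ {avg_record_bytes:.1f} B")
print(f"total cells: {cells}")
```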
Variable types
| Text | 1 |
|---|---|
| Numeric | 9 |
| Categorical | 9 |
| DateTime | 2 |
| Boolean | 1 |
Dataset
| Description | JHB_SCHARP_004 - Quality-corrected harmonized data |
|---|---|
| Creator | RP2 Clinical Data Quality Team |
| Author | Quality-Checked Data |
| URL | HEAT Research Projects |
Variable descriptions
| Age (at enrolment) | Patient age at study enrollment |
|---|---|
| CD4 cell count (cells/µL) | CD4+ T lymphocyte count (missing codes removed) |
| HIV viral load (copies/mL) | HIV RNA copies per mL (missing codes removed) |
| BMI (kg/m²) | Body Mass Index (extreme values removed) |
| Waist circumference (cm) | Waist circumference (corrected from mm to cm) |
| ALT (U/L) | Alanine aminotransferase (missing codes removed) |
| Platelet count (×10³/µL) | Platelet count (missing codes removed) |
| Hematocrit (%) | Hematocrit (zero values removed) |
| Lymphocyte count (×10⁹/L) | Lymphocyte absolute count (corrected labeling) |
| Neutrophil count (×10⁹/L) | Neutrophil absolute count (corrected labeling) |
| cd4_correction_applied | Quality flag: CD4 missing codes removed |
| final_comprehensive_fix_applied | Quality flag: Comprehensive corrections applied |
| waist_circ_unit_correction_applied | Quality flag: Waist circ unit corrected |
Alerts
| study_source has constant value "JHB_SCHARP_004" | Constant |
|---|---|
| Sex has constant value "Male" | Constant |
| latitude has constant value "-26.2041" | Constant |
| longitude has constant value "28.03" | Constant |
| province has constant value "Gauteng" | Constant |
| city has constant value "Johannesburg" | Constant |
| jhb_subregion has constant value "Soweto" | Constant |
| cd4_correction_applied has constant value "0.0" | Constant |
| final_comprehensive_fix_applied has constant value "1.0" | Constant |
| waist_circ_unit_correction_applied has constant value "False" | Constant |
| ALT (U/L) is highly overall correlated with AST (U/L) | High correlation |
| AST (U/L) is highly overall correlated with ALT (U/L) | High correlation |
| Hematocrit (%) is highly overall correlated with hemoglobin_g_dL | High correlation |
| hemoglobin_g_dL is highly overall correlated with Hematocrit (%) | High correlation |
| anonymous_patient_id has unique values | Unique |
| Patient ID has unique values | Unique |
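The Constant and Unique flags above amount to counting distinct values per column. The following is a minimal sketch of that idea, not the profiler's own implementation; `classify_column` is a hypothetical helper and the toy columns are illustrative rather than taken from the dataset.

```python
def classify_column(values):
    """Return 'Constant', 'Unique', or None for a list of column values."""
    distinct = set(values)
    if len(distinct) == 1:
        # Every row holds the same value, e.g. Sex == "Male".
        return "Constant"
    if len(distinct) == len(values):
        # No value repeats, e.g. an identifier column.
        return "Unique"
    return None


# Hypothetical toy columns for illustration only.
columns = {
    "Sex": ["Male"] * 5,
    "Patient ID": [6, 10, 24, 25, 34],
    "Age (at enrolment)": [21, 20, 21, 18, 26],
}

alerts = {name: classify_column(vals) for name, vals in columns.items()}
print(alerts)
```

Constant columns carry no information for modelling, and unique columns usually act as identifiers, so both are natural candidates to exclude before analysis.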
Reproduction
| Analysis started | 2025-11-24 21:49:42.221460 |
|---|---|
| Analysis finished | 2025-11-24 21:49:46.692082 |
| Duration | 4.47 seconds |
| Software version | ydata-profiling v4.18.0 |
| Download configuration | config.json |
Variables
anonymous_patient_id
Text
Unique
| Distinct | 101 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.0 KiB |
Length
| Max length | 15 |
|---|---|
| Median length | 14 |
| Mean length | 14.188119 |
| Min length | 12 |
Unique
| Unique | 101 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | SCHARP_004_6 |
|---|---|
| 2nd row | SCHARP_004_10 |
| 3rd row | SCHARP_004_24 |
| 4th row | SCHARP_004_25 |
| 5th row | SCHARP_004_34 |
| Value | Count | Frequency (%) |
| scharp_004_6 | 1 | 1.0% |
| scharp_004_513 | 1 | 1.0% |
| scharp_004_24 | 1 | 1.0% |
| scharp_004_25 | 1 | 1.0% |
| scharp_004_34 | 1 | 1.0% |
| scharp_004_37 | 1 | 1.0% |
| scharp_004_39 | 1 | 1.0% |
| scharp_004_57 | 1 | 1.0% |
| scharp_004_69 | 1 | 1.0% |
| scharp_004_71 | 1 | 1.0% |
| Other values (91) | 91 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 216 | |
| _ | 202 | |
| 4 | 130 | |
| P | 101 | |
| C | 101 | |
| S | 101 | |
| R | 101 | |
| A | 101 | |
| H | 101 | |
| 1 | 67 | 4.7% |
| Other values (7) | 212 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 625 | |
| Uppercase Letter | 606 | |
| Connector Punctuation | 202 | 14.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 216 | |
| 4 | 130 | |
| 1 | 67 | 10.7% |
| 5 | 41 | 6.6% |
| 2 | 39 | 6.2% |
| 6 | 33 | 5.3% |
| 3 | 29 | 4.6% |
| 7 | 25 | 4.0% |
| 9 | 24 | 3.8% |
| 8 | 21 | 3.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 101 | |
| C | 101 | |
| S | 101 | |
| R | 101 | |
| A | 101 | |
| H | 101 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 202 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 827 | |
| Latin | 606 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 216 | |
| _ | 202 | |
| 4 | 130 | |
| 1 | 67 | 8.1% |
| 5 | 41 | 5.0% |
| 2 | 39 | 4.7% |
| 6 | 33 | 4.0% |
| 3 | 29 | 3.5% |
| 7 | 25 | 3.0% |
| 9 | 24 | 2.9% |
Latin
| Value | Count | Frequency (%) |
| P | 101 | |
| C | 101 | |
| S | 101 | |
| R | 101 | |
| A | 101 | |
| H | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1433 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 216 | |
| _ | 202 | |
| 4 | 130 | |
| P | 101 | |
| C | 101 | |
| S | 101 | |
| R | 101 | |
| A | 101 | |
| H | 101 | |
| 1 | 67 | 4.7% |
| Other values (7) | 212 |
Patient ID
Real number (ℝ)
Unique
| Distinct | 101 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 719.87129 |
| Minimum | 6 |
| Maximum | 1992 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 37 |
| Q1 | 235 |
| median | 533 |
| Q3 | 1140 |
| 95-th percentile | 1884 |
| Maximum | 1992 |
| Range | 1986 |
| Interquartile range (IQR) | 905 |
Descriptive statistics
| Standard deviation | 598.36806 |
|---|---|
| Coefficient of variation (CV) | 0.83121534 |
| Kurtosis | -0.6813023 |
| Mean | 719.87129 |
| Median Absolute Deviation (MAD) | 342 |
| Skewness | 0.77670919 |
| Sum | 72707 |
| Variance | 358044.33 |
| Monotonicity | Strictly increasing |
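Several of the derived figures above follow directly from the others. The sketch below recomputes them for Patient ID from the reported values; small differences in the last digits are expected because the inputs are already rounded.

```python
# Reported values for Patient ID, copied from the tables above.
minimum, q1, q3, maximum = 6, 235, 1140, 1992
mean, std = 719.87129, 598.36806
n = 101

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation
total = mean * n                  # Sum recovered from mean and count

print(value_range, iqr)
print(round(cv, 6), round(variance, 2), round(total))
```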
| Value | Count | Frequency (%) |
| 6 | 1 | 1.0% |
| 758 | 1 | 1.0% |
| 1107 | 1 | 1.0% |
| 1081 | 1 | 1.0% |
| 1059 | 1 | 1.0% |
| 1051 | 1 | 1.0% |
| 954 | 1 | 1.0% |
| 872 | 1 | 1.0% |
| 813 | 1 | 1.0% |
| 811 | 1 | 1.0% |
| Other values (91) | 91 |
| Value | Count | Frequency (%) |
| 6 | 1 | |
| 10 | 1 | |
| 24 | 1 | |
| 25 | 1 | |
| 34 | 1 | |
| 37 | 1 | |
| 39 | 1 | |
| 57 | 1 | |
| 69 | 1 | |
| 71 | 1 |
| Value | Count | Frequency (%) |
| 1992 | 1 | |
| 1952 | 1 | |
| 1941 | 1 | |
| 1923 | 1 | |
| 1917 | 1 | |
| 1884 | 1 | |
| 1873 | 1 | |
| 1850 | 1 | |
| 1751 | 1 | |
| 1750 | 1 |
study_source
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.0 KiB |
| JHB_SCHARP_004 |
|---|
Length
| Max length | 14 |
|---|---|
| Median length | 14 |
| Mean length | 14 |
| Min length | 14 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | JHB_SCHARP_004 |
|---|---|
| 2nd row | JHB_SCHARP_004 |
| 3rd row | JHB_SCHARP_004 |
| 4th row | JHB_SCHARP_004 |
| 5th row | JHB_SCHARP_004 |
Common Values
| Value | Count | Frequency (%) |
| JHB_SCHARP_004 | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| jhb_scharp_004 | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| H | 202 | |
| _ | 202 | |
| 0 | 202 | |
| J | 101 | |
| B | 101 | |
| S | 101 | |
| C | 101 | |
| A | 101 | |
| R | 101 | |
| P | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 909 | |
| Decimal Number | 303 | 21.4% |
| Connector Punctuation | 202 | 14.3% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| H | 202 | |
| J | 101 | |
| B | 101 | |
| S | 101 | |
| C | 101 | |
| A | 101 | |
| R | 101 | |
| P | 101 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 202 | |
| 4 | 101 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 202 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 909 | |
| Common | 505 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| H | 202 | |
| J | 101 | |
| B | 101 | |
| S | 101 | |
| C | 101 | |
| A | 101 | |
| R | 101 | |
| P | 101 |
Common
| Value | Count | Frequency (%) |
| _ | 202 | |
| 0 | 202 | |
| 4 | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1414 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| H | 202 | |
| _ | 202 | |
| 0 | 202 | |
| J | 101 | |
| B | 101 | |
| S | 101 | |
| C | 101 | |
| A | 101 | |
| R | 101 | |
| P | 101 |
primary_date
Date
| Distinct | 2 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.6 KiB |
| Minimum | 2015-01-01 00:00:00 |
| Maximum | 2016-01-01 00:00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
visit_date
Date
| Distinct | 2 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.6 KiB |
| Minimum | 2015-01-01 00:00:00 |
| Maximum | 2016-01-01 00:00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
Age (at enrolment)
Real number (ℝ)
Patient age at study enrollment
| Distinct | 19 |
|---|---|
| Distinct (%) | 18.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 23.19802 |
| Minimum | 18 |
| Maximum | 42 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 20 |
| median | 22 |
| Q3 | 26 |
| 95-th percentile | 32 |
| Maximum | 42 |
| Range | 24 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 4.6883255 |
|---|---|
| Coefficient of variation (CV) | 0.20210025 |
| Kurtosis | 3.2769758 |
| Mean | 23.19802 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.5880952 |
| Sum | 2343 |
| Variance | 21.980396 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 21 | 13 | |
| 20 | 12 | |
| 18 | 11 | |
| 26 | 10 | |
| 19 | 10 | |
| 22 | 9 | |
| 24 | 8 | |
| 25 | 6 | |
| 23 | 6 | |
| 28 | 4 | 4.0% |
| Other values (9) | 12 |
| Value | Count | Frequency (%) |
| 18 | 11 | |
| 19 | 10 | |
| 20 | 12 | |
| 21 | 13 | |
| 22 | 9 | |
| 23 | 6 | |
| 24 | 8 | |
| 25 | 6 | |
| 26 | 10 | |
| 27 | 2 | 2.0% |
| Value | Count | Frequency (%) |
| 42 | 1 | 1.0% |
| 39 | 1 | 1.0% |
| 38 | 1 | 1.0% |
| 34 | 1 | 1.0% |
| 33 | 1 | 1.0% |
| 32 | 1 | 1.0% |
| 31 | 2 | |
| 29 | 2 | |
| 28 | 4 | |
| 27 | 2 |
Sex
Categorical
Constant
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Male |
| 3rd row | Male |
| 4th row | Male |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
| Male | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| male | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| M | 101 | |
| a | 101 | |
| l | 101 | |
| e | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 303 | |
| Uppercase Letter | 101 | 25.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 101 | |
| l | 101 | |
| e | 101 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 404 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| M | 101 | |
| a | 101 | |
| l | 101 | |
| e | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 404 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| M | 101 | |
| a | 101 | |
| l | 101 | |
| e | 101 |
latitude
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.4 KiB |
| -26.2041 |
|---|
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | -26.2041 |
|---|---|
| 2nd row | -26.2041 |
| 3rd row | -26.2041 |
| 4th row | -26.2041 |
| 5th row | -26.2041 |
Common Values
| Value | Count | Frequency (%) |
| -26.2041 | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 26.2041 | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 202 | |
| - | 101 | |
| 6 | 101 | |
| . | 101 | |
| 0 | 101 | |
| 4 | 101 | |
| 1 | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 606 | |
| Dash Punctuation | 101 | 12.5% |
| Other Punctuation | 101 | 12.5% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 202 | |
| 6 | 101 | |
| 0 | 101 | |
| 4 | 101 | |
| 1 | 101 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 101 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 808 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 202 | |
| - | 101 | |
| 6 | 101 | |
| . | 101 | |
| 0 | 101 | |
| 4 | 101 | |
| 1 | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 808 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 202 | |
| - | 101 | |
| 6 | 101 | |
| . | 101 | |
| 0 | 101 | |
| 4 | 101 | |
| 1 | 101 |
longitude
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.1 KiB |
| 28.03 |
|---|
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 28.03 |
|---|---|
| 2nd row | 28.03 |
| 3rd row | 28.03 |
| 4th row | 28.03 |
| 5th row | 28.03 |
Common Values
| Value | Count | Frequency (%) |
| 28.03 | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 28.03 | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 101 | |
| 8 | 101 | |
| . | 101 | |
| 0 | 101 | |
| 3 | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 404 | |
| Other Punctuation | 101 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 101 | |
| 8 | 101 | |
| 0 | 101 | |
| 3 | 101 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 505 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 101 | |
| 8 | 101 | |
| . | 101 | |
| 0 | 101 | |
| 3 | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 505 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 101 | |
| 8 | 101 | |
| . | 101 | |
| 0 | 101 | |
| 3 | 101 |
province
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.3 KiB |
| Gauteng |
|---|
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 7 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Gauteng |
|---|---|
| 2nd row | Gauteng |
| 3rd row | Gauteng |
| 4th row | Gauteng |
| 5th row | Gauteng |
Common Values
| Value | Count | Frequency (%) |
| Gauteng | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| gauteng | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| G | 101 | |
| a | 101 | |
| u | 101 | |
| t | 101 | |
| e | 101 | |
| n | 101 | |
| g | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 606 | |
| Uppercase Letter | 101 | 14.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 101 | |
| u | 101 | |
| t | 101 | |
| e | 101 | |
| n | 101 | |
| g | 101 |
Uppercase Letter
| Value | Count | Frequency (%) |
| G | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 707 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| G | 101 | |
| a | 101 | |
| u | 101 | |
| t | 101 | |
| e | 101 | |
| n | 101 | |
| g | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 707 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| G | 101 | |
| a | 101 | |
| u | 101 | |
| t | 101 | |
| e | 101 | |
| n | 101 | |
| g | 101 |
city
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.8 KiB |
| Johannesburg |
|---|
Length
| Max length | 12 |
|---|---|
| Median length | 12 |
| Mean length | 12 |
| Min length | 12 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Johannesburg |
|---|---|
| 2nd row | Johannesburg |
| 3rd row | Johannesburg |
| 4th row | Johannesburg |
| 5th row | Johannesburg |
Common Values
| Value | Count | Frequency (%) |
| Johannesburg | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| johannesburg | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 202 | |
| J | 101 | |
| o | 101 | |
| h | 101 | |
| a | 101 | |
| e | 101 | |
| s | 101 | |
| b | 101 | |
| u | 101 | |
| r | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1111 | |
| Uppercase Letter | 101 | 8.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 202 | |
| o | 101 | |
| h | 101 | |
| a | 101 | |
| e | 101 | |
| s | 101 | |
| b | 101 | |
| u | 101 | |
| r | 101 | |
| g | 101 |
Uppercase Letter
| Value | Count | Frequency (%) |
| J | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1212 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 202 | |
| J | 101 | |
| o | 101 | |
| h | 101 | |
| a | 101 | |
| e | 101 | |
| s | 101 | |
| b | 101 | |
| u | 101 | |
| r | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1212 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 202 | |
| J | 101 | |
| o | 101 | |
| h | 101 | |
| a | 101 | |
| e | 101 | |
| s | 101 | |
| b | 101 | |
| u | 101 | |
| r | 101 |
jhb_subregion
Categorical
Constant
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 6.2 KiB |
| Soweto |
|---|
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 6 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Soweto |
|---|---|
| 2nd row | Soweto |
| 3rd row | Soweto |
| 4th row | Soweto |
| 5th row | Soweto |
Common Values
| Value | Count | Frequency (%) |
| Soweto | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| soweto | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 202 | |
| S | 101 | |
| w | 101 | |
| e | 101 | |
| t | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 505 | |
| Uppercase Letter | 101 | 16.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 202 | |
| w | 101 | |
| e | 101 | |
| t | 101 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 606 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 202 | |
| S | 101 | |
| w | 101 | |
| e | 101 | |
| t | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 606 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 202 | |
| S | 101 | |
| w | 101 | |
| e | 101 | |
| t | 101 |
Hematocrit (%)
Real number (ℝ)
High correlation
Hematocrit (zero values removed)
| Distinct | 70 |
|---|---|
| Distinct (%) | 69.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 46.6 |
| Minimum | 29.7 |
| Maximum | 54.7 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 29.7 |
|---|---|
| 5-th percentile | 41.8 |
| Q1 | 44.4 |
| median | 46.6 |
| Q3 | 48.6 |
| 95-th percentile | 51.3 |
| Maximum | 54.7 |
| Range | 25 |
| Interquartile range (IQR) | 4.2 |
Descriptive statistics
| Standard deviation | 3.4788216 |
|---|---|
| Coefficient of variation (CV) | 0.074652825 |
| Kurtosis | 4.4641552 |
| Mean | 46.6 |
| Median Absolute Deviation (MAD) | 2.2 |
| Skewness | -0.97576448 |
| Sum | 4706.6 |
| Variance | 12.1022 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 48 | 4 | 4.0% |
| 46.9 | 3 | 3.0% |
| 46.2 | 3 | 3.0% |
| 50.9 | 3 | 3.0% |
| 43.6 | 3 | 3.0% |
| 42.6 | 3 | 3.0% |
| 45.5 | 3 | 3.0% |
| 48.3 | 3 | 3.0% |
| 50.8 | 3 | 3.0% |
| 44.5 | 2 | 2.0% |
| Other values (60) | 71 |
| Value | Count | Frequency (%) |
| 29.7 | 1 | 1.0% |
| 39.9 | 1 | 1.0% |
| 40.4 | 1 | 1.0% |
| 41.4 | 1 | 1.0% |
| 41.7 | 1 | 1.0% |
| 41.8 | 1 | 1.0% |
| 42.3 | 1 | 1.0% |
| 42.5 | 1 | 1.0% |
| 42.6 | 3 | |
| 42.8 | 1 | 1.0% |
| Value | Count | Frequency (%) |
| 54.7 | 1 | 1.0% |
| 53.7 | 1 | 1.0% |
| 53.3 | 1 | 1.0% |
| 52 | 1 | 1.0% |
| 51.4 | 1 | 1.0% |
| 51.3 | 1 | 1.0% |
| 51.2 | 1 | 1.0% |
| 50.9 | 3 | |
| 50.8 | 3 | |
| 50.4 | 1 | 1.0% |
Platelet count (×10³/µL)
Real number (ℝ)
Platelet count (missing codes removed)
| Distinct | 81 |
|---|---|
| Distinct (%) | 80.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 278.15842 |
| Minimum | 142 |
| Maximum | 438 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 142 |
|---|---|
| 5-th percentile | 201 |
| Q1 | 241 |
| median | 270 |
| Q3 | 315 |
| 95-th percentile | 374 |
| Maximum | 438 |
| Range | 296 |
| Interquartile range (IQR) | 74 |
Descriptive statistics
| Standard deviation | 54.837347 |
|---|---|
| Coefficient of variation (CV) | 0.1971443 |
| Kurtosis | 0.36896072 |
| Mean | 278.15842 |
| Median Absolute Deviation (MAD) | 38 |
| Skewness | 0.48248693 |
| Sum | 28094 |
| Variance | 3007.1347 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 266 | 4 | 4.0% |
| 234 | 2 | 2.0% |
| 294 | 2 | 2.0% |
| 311 | 2 | 2.0% |
| 232 | 2 | 2.0% |
| 261 | 2 | 2.0% |
| 253 | 2 | 2.0% |
| 248 | 2 | 2.0% |
| 211 | 2 | 2.0% |
| 265 | 2 | 2.0% |
| Other values (71) | 79 |
| Value | Count | Frequency (%) |
| 142 | 1 | |
| 179 | 1 | |
| 194 | 1 | |
| 195 | 1 | |
| 197 | 1 | |
| 201 | 1 | |
| 206 | 1 | |
| 209 | 2 | |
| 211 | 2 | |
| 214 | 1 |
| Value | Count | Frequency (%) |
| 438 | 1 | |
| 436 | 1 | |
| 397 | 1 | |
| 383 | 1 | |
| 377 | 1 | |
| 374 | 1 | |
| 373 | 1 | |
| 370 | 1 | |
| 357 | 1 | |
| 347 | 1 |
hemoglobin_g_dL
Real number (ℝ)
High correlation
| Distinct | 44 |
|---|---|
| Distinct (%) | 43.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.659406 |
| Minimum | 10.2 |
| Maximum | 18.4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 10.2 |
|---|---|
| 5-th percentile | 13.8 |
| Q1 | 14.9 |
| median | 15.7 |
| Q3 | 16.6 |
| 95-th percentile | 17.8 |
| Maximum | 18.4 |
| Range | 8.2 |
| Interquartile range (IQR) | 1.7 |
Descriptive statistics
| Standard deviation | 1.316676 |
|---|---|
| Coefficient of variation (CV) | 0.084082116 |
| Kurtosis | 2.114995 |
| Mean | 15.659406 |
| Median Absolute Deviation (MAD) | 0.9 |
| Skewness | -0.71216193 |
| Sum | 1581.6 |
| Variance | 1.7336356 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 15.7 | 7 | 6.9% |
| 15.8 | 6 | 5.9% |
| 16.2 | 5 | 5.0% |
| 16.5 | 4 | 4.0% |
| 15.2 | 4 | 4.0% |
| 16.6 | 4 | 4.0% |
| 16 | 3 | 3.0% |
| 14.4 | 3 | 3.0% |
| 15.5 | 3 | 3.0% |
| 14.2 | 3 | 3.0% |
| Other values (34) | 59 |
| Value | Count | Frequency (%) |
| 10.2 | 1 | 1.0% |
| 12.3 | 1 | 1.0% |
| 13.2 | 1 | 1.0% |
| 13.3 | 1 | 1.0% |
| 13.4 | 1 | 1.0% |
| 13.8 | 1 | 1.0% |
| 13.9 | 2 | |
| 14 | 1 | 1.0% |
| 14.1 | 1 | 1.0% |
| 14.2 | 3 |
| Value | Count | Frequency (%) |
| 18.4 | 1 | 1.0% |
| 18.3 | 1 | 1.0% |
| 18.1 | 1 | 1.0% |
| 17.9 | 2 | |
| 17.8 | 1 | 1.0% |
| 17.6 | 1 | 1.0% |
| 17.4 | 2 | |
| 17.1 | 2 | |
| 17 | 3 | |
| 16.9 | 3 |
Lymphocyte count (×10⁹/L)
Real number (ℝ)
Lymphocyte absolute count (corrected labeling)
| Distinct | 78 |
|---|---|
| Distinct (%) | 77.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.0021782 |
| Minimum | 0.77 |
| Maximum | 4.31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 0.77 |
|---|---|
| 5-th percentile | 1.05 |
| Q1 | 1.56 |
| median | 1.95 |
| Q3 | 2.36 |
| 95-th percentile | 3.09 |
| Maximum | 4.31 |
| Range | 3.54 |
| Interquartile range (IQR) | 0.8 |
Descriptive statistics
| Standard deviation | 0.63511984 |
|---|---|
| Coefficient of variation (CV) | 0.31721444 |
| Kurtosis | 0.89462595 |
| Mean | 2.0021782 |
| Median Absolute Deviation (MAD) | 0.4 |
| Skewness | 0.6877214 |
| Sum | 202.22 |
| Variance | 0.40337721 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.72 | 3 | 3.0% |
| 2.41 | 3 | 3.0% |
| 2.35 | 3 | 3.0% |
| 1.56 | 3 | 3.0% |
| 1.63 | 3 | 3.0% |
| 1.54 | 2 | 2.0% |
| 2.36 | 2 | 2.0% |
| 1.67 | 2 | 2.0% |
| 1.05 | 2 | 2.0% |
| 1.88 | 2 | 2.0% |
| Other values (68) | 76 |
| Value | Count | Frequency (%) |
| 0.77 | 1 | |
| 0.96 | 1 | |
| 0.97 | 1 | |
| 1 | 1 | |
| 1.05 | 2 | |
| 1.12 | 1 | |
| 1.13 | 1 | |
| 1.16 | 1 | |
| 1.19 | 1 | |
| 1.2 | 1 |
| Value | Count | Frequency (%) |
| 4.31 | 1 | |
| 3.67 | 1 | |
| 3.33 | 1 | |
| 3.13 | 1 | |
| 3.1 | 1 | |
| 3.09 | 1 | |
| 3.05 | 1 | |
| 2.99 | 1 | |
| 2.95 | 1 | |
| 2.81 | 1 |
Neutrophil count (×10⁹/L)
Real number (ℝ)
Neutrophil absolute count (corrected labeling)
| Distinct | 91 |
|---|---|
| Distinct (%) | 90.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.0844554 |
| Minimum | 1.2 |
| Maximum | 9.68 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 1.2 |
|---|---|
| 5-th percentile | 1.47 |
| Q1 | 1.99 |
| median | 2.67 |
| Q3 | 3.77 |
| 95-th percentile | 6.22 |
| Maximum | 9.68 |
| Range | 8.48 |
| Interquartile range (IQR) | 1.78 |
Descriptive statistics
| Standard deviation | 1.5919375 |
|---|---|
| Coefficient of variation (CV) | 0.51611622 |
| Kurtosis | 3.2682385 |
| Mean | 3.0844554 |
| Median Absolute Deviation (MAD) | 0.85 |
| Skewness | 1.6363613 |
| Sum | 311.53 |
| Variance | 2.534265 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 5.38 | 3 | 3.0% |
| 2.73 | 3 | 3.0% |
| 1.68 | 3 | 3.0% |
| 2.1 | 2 | 2.0% |
| 1.99 | 2 | 2.0% |
| 2.67 | 2 | 2.0% |
| 2.19 | 2 | 2.0% |
| 3.31 | 1 | 1.0% |
| 2.21 | 1 | 1.0% |
| 3.58 | 1 | 1.0% |
| Other values (81) | 81 |
| Value | Count | Frequency (%) |
| 1.2 | 1 | |
| 1.36 | 1 | |
| 1.37 | 1 | |
| 1.39 | 1 | |
| 1.46 | 1 | |
| 1.47 | 1 | |
| 1.48 | 1 | |
| 1.53 | 1 | |
| 1.55 | 1 | |
| 1.56 | 1 |
| Value | Count | Frequency (%) |
| 9.68 | 1 | 1.0% |
| 8.35 | 1 | 1.0% |
| 7.25 | 1 | 1.0% |
| 7.05 | 1 | 1.0% |
| 6.88 | 1 | 1.0% |
| 6.22 | 1 | 1.0% |
| 5.62 | 1 | 1.0% |
| 5.38 | 3 | |
| 4.98 | 1 | 1.0% |
| 4.91 | 1 | 1.0% |
ALT (U/L)
Real number (ℝ)
High correlation
Alanine aminotransferase (missing codes removed)
| Distinct | 38 |
|---|---|
| Distinct (%) | 37.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22.693069 |
| Minimum | 6 |
| Maximum | 157 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 6 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 11 |
| median | 17 |
| Q3 | 23 |
| 95-th percentile | 57 |
| Maximum | 157 |
| Range | 151 |
| Interquartile range (IQR) | 12 |
Descriptive statistics
| Standard deviation | 20.74596 |
|---|---|
| Coefficient of variation (CV) | 0.91419806 |
| Kurtosis | 18.64885 |
| Mean | 22.693069 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | 3.7291486 |
| Sum | 2292 |
| Variance | 430.39485 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 11 | 9 | 8.9% |
| 10 | 9 | 8.9% |
| 13 | 7 | 6.9% |
| 18 | 6 | 5.9% |
| 15 | 6 | 5.9% |
| 19 | 6 | 5.9% |
| 9 | 5 | 5.0% |
| 16 | 5 | 5.0% |
| 21 | 4 | 4.0% |
| 12 | 4 | 4.0% |
| Other values (28) | 40 |
| Value | Count | Frequency (%) |
| 6 | 2 | 2.0% |
| 8 | 1 | 1.0% |
| 9 | 5 | |
| 10 | 9 | |
| 11 | 9 | |
| 12 | 4 | |
| 13 | 7 | |
| 14 | 2 | 2.0% |
| 15 | 6 | |
| 16 | 5 |
| Value | Count | Frequency (%) |
| 157 | 1 | |
| 93 | 1 | |
| 89 | 1 | |
| 63 | 1 | |
| 61 | 1 | |
| 57 | 1 | |
| 53 | 1 | |
| 52 | 1 | |
| 49 | 1 | |
| 47 | 1 |
AST (U/L)
Real number (ℝ)
High correlation
| Distinct | 29 |
|---|---|
| Distinct (%) | 28.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 25.217822 |
| Minimum | 14 |
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.6 KiB |
Quantile statistics
| Minimum | 14 |
|---|---|
| 5-th percentile | 15 |
| Q1 | 19 |
| median | 23 |
| Q3 | 27 |
| 95-th percentile | 47 |
| Maximum | 100 |
| Range | 86 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 11.34866 |
|---|---|
| Coefficient of variation (CV) | 0.45002538 |
| Kurtosis | 18.547589 |
| Mean | 25.217822 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 3.469433 |
| Sum | 2547 |
| Variance | 128.79208 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 23 | 9 | 8.9% |
| 20 | 9 | 8.9% |
| 19 | 9 | 8.9% |
| 25 | 8 | 7.9% |
| 17 | 6 | 5.9% |
| 21 | 5 | 5.0% |
| 15 | 5 | 5.0% |
| 16 | 5 | 5.0% |
| 26 | 5 | 5.0% |
| 27 | 4 | 4.0% |
| Other values (19) | 36 |
| Value | Count | Frequency (%) |
| 14 | 2 | 2.0% |
| 15 | 5 | |
| 16 | 5 | |
| 17 | 6 | |
| 18 | 4 | |
| 19 | 9 | |
| 20 | 9 | |
| 21 | 5 | |
| 22 | 3 | 3.0% |
| 23 | 9 |
| Value | Count | Frequency (%) |
| 100 | 1 | |
| 54 | 1 | |
| 53 | 1 | |
| 49 | 1 | |
| 48 | 1 | |
| 47 | 1 | |
| 41 | 1 | |
| 40 | 1 | |
| 39 | 2 | |
| 38 | 2 |
cd4_correction_applied
Categorical
Constant
Quality flag: CD4 missing codes removed
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.9 KiB |
| 0.0 |
|---|
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0.0 |
|---|---|
| 2nd row | 0.0 |
| 3rd row | 0.0 |
| 4th row | 0.0 |
| 5th row | 0.0 |
Common Values
| Value | Count | Frequency (%) |
| 0.0 | 101 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 0.0 | 101 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 202 | |
| . | 101 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 202 | |
| Other Punctuation | 101 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 202 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 101 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 303 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 202 | |
| . | 101 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 303 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 202 | |
| . | 101 |
final_comprehensive_fix_applied
Categorical
Constant
Quality flag: Comprehensive corrections applied
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 5.9 KiB |
| 1.0 |
|---|
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1.0 |
|---|---|
| 2nd row | 1.0 |
| 3rd row | 1.0 |
| 4th row | 1.0 |
| 5th row | 1.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.0 | 101 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 101 | 33.3% |
| . | 101 | 33.3% |
| 0 | 101 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 202 | 66.7% |
| Other Punctuation | 101 | 33.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 101 | 50.0% |
| 0 | 101 | 50.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 101 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 303 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 101 | 33.3% |
| . | 101 | 33.3% |
| 0 | 101 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 303 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 101 | 33.3% |
| . | 101 | 33.3% |
| 0 | 101 | 33.3% |
waist_circ_unit_correction_applied
| Distinct | 1 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 909.0 B |
| Value | Count | Frequency (%) |
|---|---|---|
| False | 101 | 100.0% |
Interactions
Correlations
| | ALT (U/L) | AST (U/L) | Age (at enrolment) | Hematocrit (%) | Lymphocyte count (×10⁹/L) | Neutrophil count (×10⁹/L) | Patient ID | Platelet count (×10³/µL) | hemoglobin_g_dL |
|---|---|---|---|---|---|---|---|---|---|
| ALT (U/L) | 1.000 | 0.692 | 0.165 | 0.038 | 0.067 | 0.244 | -0.143 | 0.075 | 0.059 |
| AST (U/L) | 0.692 | 1.000 | 0.149 | -0.008 | -0.109 | 0.225 | -0.186 | 0.076 | -0.021 |
| Age (at enrolment) | 0.165 | 0.149 | 1.000 | -0.131 | 0.022 | 0.076 | -0.186 | -0.153 | -0.140 |
| Hematocrit (%) | 0.038 | -0.008 | -0.131 | 1.000 | 0.040 | 0.154 | -0.043 | 0.107 | 0.913 |
| Lymphocyte count (×10⁹/L) | 0.067 | -0.109 | 0.022 | 0.040 | 1.000 | 0.065 | 0.088 | 0.244 | -0.019 |
| Neutrophil count (×10⁹/L) | 0.244 | 0.225 | 0.076 | 0.154 | 0.065 | 1.000 | -0.074 | 0.183 | 0.102 |
| Patient ID | -0.143 | -0.186 | -0.186 | -0.043 | 0.088 | -0.074 | 1.000 | -0.051 | -0.011 |
| Platelet count (×10³/µL) | 0.075 | 0.076 | -0.153 | 0.107 | 0.244 | 0.183 | -0.051 | 1.000 | 0.020 |
| hemoglobin_g_dL | 0.059 | -0.021 | -0.140 | 0.913 | -0.019 | 0.102 | -0.011 | 0.020 | 1.000 |
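
The largest off-diagonal entry in the matrix is the 0.913 between Hematocrit (%) and hemoglobin_g_dL. The report does not state which coefficient it uses, so the sketch below assumes Pearson's r and applies it only to the first ten rows listed in the Sample section. It illustrates the calculation and is not expected to reproduce the full-dataset figure exactly. The function name `pearson_r` is illustrative.

```python
import math

# First ten rows from the report's Sample table (a subset of the 101 observations).
hematocrit = [46.9, 43.6, 50.8, 45.9, 49.3, 46.0, 48.8, 45.9, 44.5, 48.0]
hemoglobin = [15.7, 14.9, 16.9, 14.9, 16.5, 15.8, 17.0, 15.2, 14.8, 15.8]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(hematocrit, hemoglobin)
print(f"r = {r:.3f}")
```

On this ten-row subset the coefficient is also strongly positive, which is the expected relationship between the two measures.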
Missing values
Sample
First rows
| | anonymous_patient_id | Patient ID | study_source | primary_date | visit_date | Age (at enrolment) | Sex | latitude | longitude | province | city | jhb_subregion | Hematocrit (%) | Platelet count (×10³/µL) | hemoglobin_g_dL | Lymphocyte count (×10⁹/L) | Neutrophil count (×10⁹/L) | ALT (U/L) | AST (U/L) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6128 | SCHARP_004_6 | 6 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 28.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 46.9 | 322.0 | 15.7 | 1.90 | 3.31 | 46.0 | 25.0 | 0.0 | 1.0 | False |
| 6129 | SCHARP_004_10 | 10 | JHB_SCHARP_004 | 2016-01-01 00:00:00 | 2016-01-01 00:00:00 | 26.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 43.6 | 195.0 | 14.9 | 1.05 | 5.38 | 12.0 | 23.0 | 0.0 | 1.0 | False |
| 6130 | SCHARP_004_24 | 24 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 27.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 50.8 | 269.0 | 16.9 | 2.18 | 1.84 | 61.0 | 54.0 | 0.0 | 1.0 | False |
| 6131 | SCHARP_004_25 | 25 | JHB_SCHARP_004 | 2016-01-01 00:00:00 | 2016-01-01 00:00:00 | 19.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 45.9 | 275.0 | 14.9 | 3.67 | 2.33 | 15.0 | 23.0 | 0.0 | 1.0 | False |
| 6132 | SCHARP_004_34 | 34 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 31.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 49.3 | 290.0 | 16.5 | 2.04 | 2.61 | 15.0 | 25.0 | 0.0 | 1.0 | False |
| 6133 | SCHARP_004_37 | 37 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 23.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 46.0 | 257.0 | 15.8 | 2.74 | 1.99 | 23.0 | 18.0 | 0.0 | 1.0 | False |
| 6134 | SCHARP_004_39 | 39 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 22.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 48.8 | 224.0 | 17.0 | 2.77 | 2.38 | 22.0 | 24.0 | 0.0 | 1.0 | False |
| 6135 | SCHARP_004_57 | 57 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 22.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 45.9 | 246.0 | 15.2 | 1.78 | 3.92 | 25.0 | 25.0 | 0.0 | 1.0 | False |
| 6136 | SCHARP_004_69 | 69 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 24.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 44.5 | 289.0 | 14.8 | 2.34 | 6.22 | 32.0 | 47.0 | 0.0 | 1.0 | False |
| 6137 | SCHARP_004_71 | 71 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 26.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 48.0 | 241.0 | 15.8 | 1.62 | 1.68 | 21.0 | 26.0 | 0.0 | 1.0 | False |
Last rows
| | anonymous_patient_id | Patient ID | study_source | primary_date | visit_date | Age (at enrolment) | Sex | latitude | longitude | province | city | jhb_subregion | Hematocrit (%) | Platelet count (×10³/µL) | hemoglobin_g_dL | Lymphocyte count (×10⁹/L) | Neutrophil count (×10⁹/L) | ALT (U/L) | AST (U/L) | cd4_correction_applied | final_comprehensive_fix_applied | waist_circ_unit_correction_applied |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6219 | SCHARP_004_1750 | 1750 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 24.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 43.8 | 197.0 | 14.2 | 2.35 | 1.66 | 13.0 | 23.0 | 0.0 | 1.0 | False |
| 6220 | SCHARP_004_1751 | 1751 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 21.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 46.9 | 346.0 | 16.5 | 2.38 | 4.74 | 28.0 | 26.0 | 0.0 | 1.0 | False |
| 6221 | SCHARP_004_1850 | 1850 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 21.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 46.3 | 305.0 | 15.9 | 2.09 | 1.77 | 13.0 | 21.0 | 0.0 | 1.0 | False |
| 6222 | SCHARP_004_1873 | 1873 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 20.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 47.4 | 278.0 | 15.6 | 2.64 | 2.84 | 12.0 | 20.0 | 0.0 | 1.0 | False |
| 6223 | SCHARP_004_1884 | 1884 | JHB_SCHARP_004 | 2016-01-01 00:00:00 | 2016-01-01 00:00:00 | 18.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 53.7 | 310.0 | 18.3 | 1.61 | 1.72 | 22.0 | 25.0 | 0.0 | 1.0 | False |
| 6224 | SCHARP_004_1917 | 1917 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 20.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 50.9 | 270.0 | 17.1 | 1.98 | 8.35 | 18.0 | 22.0 | 0.0 | 1.0 | False |
| 6225 | SCHARP_004_1923 | 1923 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 21.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 43.2 | 245.0 | 13.8 | 2.81 | 2.60 | 28.0 | 28.0 | 0.0 | 1.0 | False |
| 6226 | SCHARP_004_1941 | 1941 | JHB_SCHARP_004 | 2016-01-01 00:00:00 | 2016-01-01 00:00:00 | 18.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 48.9 | 316.0 | 16.9 | 1.56 | 3.79 | 52.0 | 32.0 | 0.0 | 1.0 | False |
| 6227 | SCHARP_004_1952 | 1952 | JHB_SCHARP_004 | 2015-01-01 00:00:00 | 2015-01-01 00:00:00 | 26.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 40.4 | 315.0 | 13.2 | 2.43 | 2.07 | 17.0 | 29.0 | 0.0 | 1.0 | False |
| 6228 | SCHARP_004_1992 | 1992 | JHB_SCHARP_004 | 2016-01-01 00:00:00 | 2016-01-01 00:00:00 | 26.0 | Male | -26.2041 | 28.03 | Gauteng | Johannesburg | Soweto | 50.4 | 238.0 | 16.7 | 2.61 | 2.05 | 10.0 | 19.0 | 0.0 | 1.0 | False |
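
Several columns in the rows above (Sex, city, and the correction flags, among others) carry the same value in every record, which is what the report's "Constant" alerts describe. A minimal sketch of such a check follows, using three sample rows restricted to four columns; the helper name `constant_columns` is hypothetical and this is not the profiler's implementation.

```python
# Three rows from the Sample table, restricted to a few columns for brevity.
rows = [
    {"Age (at enrolment)": 28.0, "Sex": "Male", "city": "Johannesburg", "cd4_correction_applied": 0.0},
    {"Age (at enrolment)": 26.0, "Sex": "Male", "city": "Johannesburg", "cd4_correction_applied": 0.0},
    {"Age (at enrolment)": 27.0, "Sex": "Male", "city": "Johannesburg", "cd4_correction_applied": 0.0},
]


def constant_columns(records):
    """Return {column: value} for columns whose value never varies."""
    if not records:
        return {}
    result = {}
    for column in records[0]:
        # A column is constant when it has exactly one distinct value.
        distinct = {record[column] for record in records}
        if len(distinct) == 1:
            result[column] = distinct.pop()
    return result


print(constant_columns(rows))
```

A column that is constant within a sample is not necessarily constant across all 101 observations, so a check like this is only conclusive when run on the full dataset.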